Exercises: An Introduction to Statistical Learning, with Applications in R, Section 2.4 Exercises: 1, 3, 8, 10

1,

image.png

image.png

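We base the answers on the bias-variance decomposition of the test-set MSE for the regression model, shown in the figure above.

In general, a more flexible method has higher variance but lower bias; in the book's words, "In general, more flexible statistical methods have higher variance."

The irreducible error, i.e. the variance of the error term, is beyond our control, so we only consider the factors that affect bias and variance.

Put another way: a more flexible method can keep the bias small, and the irreducible error cannot be changed, so the choice comes down to variance. If the variance can be kept small, we can choose a more flexible method; otherwise we should prefer an inflexible one.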
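For reference, the decomposition this argument relies on can be written as follows. This is the standard form given in the book, with $x_0$ a test point, $\hat{f}$ the fitted model and $\varepsilon$ the error term:

```latex
E\left(y_0 - \hat{f}(x_0)\right)^2
  = \mathrm{Var}\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}(\varepsilon)
```

The first two terms depend on the method we choose; $\mathrm{Var}(\varepsilon)$ is the irreducible error and is the same for every method.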

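(1) The sample size is very large, but the number of predictors (features, input variables) is small:

A flexible model has low bias, and a small p means few parameters to estimate. For the effect of the larger sample size, see the formula below for the variance of the model's predictions.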

image.png

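So for a more flexible model, the variance stays under control as the sample size grows. We therefore choose the more flexible model: it should perform better.

(2) The number of observations is small, but the number of predictors is very large:

This is the reverse of the previous case. A more flexible model can reduce the bias, but its variance cannot be kept in check: with so few observations it overfits easily, which means high variance. An inflexible method keeps the variance under control, so we choose it: a more flexible model would likely perform worse.

(3) The relationship between the response and the predictors is highly non-linear:

A highly non-linear relationship calls for a complex fit. A more flexible model can reduce the bias, which an inflexible model can hardly do, so the more flexible model should perform better.

(4) The irreducible error is very high:

The irreducible error is beyond our control and is the same for every method, so by itself it favors neither choice. However, a more flexible model has lower bias but higher variance, and with very noisy data it tends to fit the noise, so the increase in variance dominates. A more flexible model would therefore likely perform worse.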
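Cases (1) and (2) can be checked with a small simulation. The sketch below is only an illustration under assumptions of our own: the true function `sin(x)`, the noise level, the sample sizes and the polynomial degrees are all made up, and "flexibility" is represented by the degree of a polynomial regression.

```r
set.seed(1)

# Hypothetical data-generating process (an assumption for illustration only)
f_true <- function(x) sin(x)

# Average test MSE of a degree-`degree` polynomial fit on `n` training points
avg_test_mse <- function(n, degree, reps = 100, sigma = 0.5) {
  # One fixed test set per call
  test <- data.frame(x = runif(1000, 0, 2 * pi))
  test$y <- f_true(test$x) + rnorm(1000, sd = sigma)
  mse <- replicate(reps, {
    # Fresh training set in every repetition
    train <- data.frame(x = runif(n, 0, 2 * pi))
    train$y <- f_true(train$x) + rnorm(n, sd = sigma)
    fit <- lm(y ~ poly(x, degree), data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(mse)
}

# Inflexible (degree 1) versus flexible (degree 8), small n versus large n
results <- data.frame(n = c(20, 20, 2000, 2000), degree = c(1, 8, 1, 8))
results$test_mse <- mapply(avg_test_mse, results$n, results$degree)
print(results)
```

With a large sample, the flexible fit should come out ahead because its extra variance is small while the linear fit keeps its bias. With a small sample, the comparison depends on how much the variance of the flexible fit inflates.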

3,

image.png

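Since only a sketch is asked for, the curves just need the right general shape; simulated data could also be used for a demonstration.

The quantities to draw are the (squared) bias, the variance, the training error, the test error, and the Bayes error curve (in a classification problem this is the theoretically lowest misclassification rate; here it corresponds to the irreducible error, the lowest achievable error).

First, the training and test error curves can be modeled on those in the lecture slides: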

image.png image-2.png image-3.png

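The explanation: as the flexibility (complexity) of the model increases, it is easy to reduce the bias on the training set, but this brings overfitting. So the test error, measured here by MSE, is typically U-shaped: it first falls and then rises. The training error decreases monotonically.

The bias-variance decomposition curves can be borrowed in the same way: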

image-4.png

Putting it together, our sketch looks like this:

20250304-111641.jpg

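Explanation:

Training error: as the complexity and flexibility of the model increase, the bias is easily reduced, which lowers the error on the training set. A sufficiently flexible model can fit the training data perfectly, so the training error keeps falling and can even approach 0.

Test error: at first the bias falls and the variance rises, but the bias falls faster, so the error curve goes down. As the model becomes more flexible still, the increase in variance (overfitting) outweighs the decrease in bias, so the curve turns back up. This is the bias-variance trade-off.

Bias: as noted above, the bias keeps decreasing as the model becomes more complex and flexible. It is the difference between the model's average prediction and the true value.

Variance: a more flexible model is more prone to overfitting, so the variance keeps increasing.

Irreducible error: this is the theoretical lower bound on the test error, so it is a horizontal line. It is the lowest error any model can achieve given the noise inherent in the data, and it does not change with the flexibility of the method.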
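The shapes described above can also be reproduced with simulated data. The following is a minimal sketch under assumptions of our own: the true function `sin(2 * x)`, the noise level `sigma` and the sample sizes are hypothetical, and flexibility is measured by polynomial degree.

```r
set.seed(42)

# Hypothetical true function and noise level (assumptions for illustration)
f_true <- function(x) sin(2 * x)
sigma <- 0.3

train <- data.frame(x = runif(50, 0, 3))
train$y <- f_true(train$x) + rnorm(50, sd = sigma)
test <- data.frame(x = runif(2000, 0, 3))
test$y <- f_true(test$x) + rnorm(2000, sd = sigma)

# Fit polynomials of increasing degree and record both errors
degrees <- 1:10
errors <- as.data.frame(t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d,
    train_mse = mean(residuals(fit)^2),
    test_mse = mean((test$y - predict(fit, newdata = test))^2))
})))
print(errors)

# Training and test MSE against flexibility, plus the irreducible error
matplot(errors$degree, errors[, c("train_mse", "test_mse")],
        type = "b", pch = 1:2, lty = 1:2, col = 1:2,
        xlab = "Flexibility (polynomial degree)", ylab = "MSE")
abline(h = sigma^2, lty = 3)  # Var(epsilon), a horizontal line
legend("top", legend = c("training MSE", "test MSE", "irreducible error"),
       lty = c(1, 2, 3), col = c(1, 2, 1), pch = c(1, 2, NA))
```

Because the polynomial models are nested, the training MSE cannot increase with the degree; the test MSE has no such guarantee.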

8,

image.png image-2.png image-3.png image-4.png image-5.png

In [ ]:
# a,b

# Note: the dataset must be downloaded from https://www.statlearning.com/resources-first-edition
library(tidyverse)

# Read the data
college <- read_csv("/data1/project/College.csv",col_names = TRUE)
college
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
New names:
• `` -> `...1`
Rows: 777 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): ...1, Private
dbl (17): Apps, Accept, Enroll, Top10perc, Top25perc, F.Undergrad, P.Undergr...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
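A spec_tbl_df: 777 × 19

| ...1 | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <chr> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
| Albertus Magnus College | Yes | 353 | 340 | 103 | 17 | 45 | 416 | 230 | 13290 | 5720 | 500 | 1500 | 90 | 93 | 11.5 | 26 | 8861 | 63 |
| Albion College | Yes | 1899 | 1720 | 489 | 37 | 68 | 1594 | 32 | 13868 | 4826 | 450 | 850 | 89 | 100 | 13.7 | 37 | 11487 | 73 |
| Albright College | Yes | 1038 | 839 | 227 | 30 | 63 | 973 | 306 | 15595 | 4400 | 300 | 500 | 79 | 84 | 11.3 | 23 | 11644 | 80 |
| Alderson-Broaddus College | Yes | 582 | 498 | 172 | 21 | 44 | 799 | 78 | 10468 | 3380 | 660 | 1800 | 40 | 41 | 11.5 | 15 | 8991 | 52 |
| Alfred University | Yes | 1732 | 1425 | 472 | 37 | 75 | 1830 | 110 | 16548 | 5406 | 500 | 600 | 82 | 88 | 11.3 | 31 | 10932 | 73 |
| Allegheny College | Yes | 2652 | 1900 | 484 | 44 | 77 | 1707 | 44 | 17080 | 4440 | 400 | 600 | 73 | 91 | 9.9 | 41 | 11711 | 76 |
| Allentown Coll. of St. Francis de Sales | Yes | 1179 | 780 | 290 | 38 | 64 | 1130 | 638 | 9690 | 4785 | 600 | 1000 | 60 | 84 | 13.3 | 21 | 7940 | 74 |
| Alma College | Yes | 1267 | 1080 | 385 | 44 | 73 | 1306 | 28 | 12572 | 4552 | 400 | 400 | 79 | 87 | 15.3 | 32 | 9305 | 68 |
| Alverno College | Yes | 494 | 313 | 157 | 23 | 46 | 1317 | 1235 | 8352 | 3640 | 650 | 2449 | 36 | 69 | 11.1 | 26 | 8127 | 55 |
| American International College | Yes | 1420 | 1093 | 220 | 9 | 22 | 1018 | 287 | 8700 | 4780 | 450 | 1400 | 78 | 84 | 14.7 | 19 | 7355 | 69 |
| Amherst College | Yes | 4302 | 992 | 418 | 83 | 96 | 1593 | 5 | 19760 | 5300 | 660 | 1598 | 93 | 98 | 8.4 | 63 | 21424 | 100 |
| Anderson University | Yes | 1216 | 908 | 423 | 19 | 40 | 1819 | 281 | 10100 | 3520 | 550 | 1100 | 48 | 61 | 12.1 | 14 | 7994 | 59 |
| Andrews University | Yes | 1130 | 704 | 322 | 14 | 23 | 1586 | 326 | 9996 | 3090 | 900 | 1320 | 62 | 66 | 11.5 | 18 | 10908 | 46 |
| Angelo State University | No | 3540 | 2001 | 1016 | 24 | 54 | 4190 | 1512 | 5130 | 3592 | 500 | 2000 | 60 | 62 | 23.1 | 5 | 4010 | 34 |
| Antioch University | Yes | 713 | 661 | 252 | 25 | 44 | 712 | 23 | 15476 | 3336 | 400 | 1100 | 69 | 82 | 11.3 | 35 | 42926 | 48 |
| Appalachian State University | No | 7313 | 4664 | 1910 | 20 | 63 | 9940 | 1035 | 6806 | 2540 | 96 | 2000 | 83 | 96 | 18.3 | 14 | 5854 | 70 |
| Aquinas College | Yes | 619 | 516 | 219 | 20 | 51 | 1251 | 767 | 11208 | 4124 | 350 | 1615 | 55 | 65 | 12.7 | 25 | 6584 | 65 |
| Arizona State University Main campus | No | 12809 | 10308 | 3761 | 24 | 49 | 22593 | 7585 | 7434 | 4850 | 700 | 2100 | 88 | 93 | 18.9 | 5 | 4602 | 48 |
| Arkansas College (Lyon College) | Yes | 708 | 334 | 166 | 46 | 74 | 530 | 182 | 8644 | 3922 | 500 | 800 | 79 | 88 | 12.6 | 24 | 14579 | 54 |
| Arkansas Tech University | No | 1734 | 1729 | 951 | 12 | 52 | 3602 | 939 | 3460 | 2650 | 450 | 1000 | 57 | 60 | 19.6 | 5 | 4739 | 48 |
| Assumption College | Yes | 2135 | 1700 | 491 | 23 | 59 | 1708 | 689 | 12000 | 5920 | 500 | 500 | 93 | 93 | 13.8 | 30 | 7100 | 88 |
| Auburn University-Main Campus | No | 7548 | 6791 | 3070 | 25 | 57 | 16262 | 1716 | 6300 | 3933 | 600 | 1908 | 85 | 91 | 16.7 | 18 | 6642 | 69 |
| Augsburg College | Yes | 662 | 513 | 257 | 12 | 30 | 2074 | 726 | 11902 | 4372 | 540 | 950 | 65 | 65 | 12.8 | 31 | 7836 | 58 |
| Augustana College IL | Yes | 1879 | 1658 | 497 | 36 | 69 | 1950 | 38 | 13353 | 4173 | 540 | 821 | 78 | 83 | 12.7 | 40 | 9220 | 71 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Westfield State College | No | 3100 | 2150 | 825 | 3 | 20 | 3234 | 941 | 5542 | 3788 | 500 | 1300 | 75 | 79 | 15.7 | 20 | 4222 | 65 |
| Westminster College MO | Yes | 662 | 553 | 184 | 20 | 43 | 665 | 37 | 10720 | 4050 | 600 | 1650 | 66 | 70 | 12.5 | 20 | 7925 | 62 |
| Westminster College | Yes | 996 | 866 | 377 | 29 | 58 | 1411 | 72 | 12065 | 3615 | 430 | 685 | 62 | 78 | 12.5 | 41 | 8596 | 80 |
| Westminster College of Salt Lake City | Yes | 917 | 720 | 213 | 21 | 60 | 979 | 743 | 8820 | 4050 | 600 | 2025 | 68 | 83 | 10.5 | 34 | 7170 | 50 |
| Westmont College | No | 950 | 713 | 351 | 42 | 72 | 1276 | 9 | 14320 | 5304 | 490 | 1410 | 77 | 77 | 14.9 | 17 | 8837 | 87 |
| Wheaton College IL | Yes | 1432 | 920 | 548 | 56 | 84 | 2200 | 56 | 11480 | 4200 | 530 | 1400 | 81 | 83 | 12.7 | 40 | 11916 | 85 |
| Westminster College PA | Yes | 1738 | 1373 | 417 | 21 | 55 | 1335 | 30 | 18460 | 5970 | 700 | 850 | 92 | 96 | 13.2 | 41 | 22704 | 71 |
| Wheeling Jesuit College | Yes | 903 | 755 | 213 | 15 | 49 | 971 | 305 | 10500 | 4545 | 600 | 600 | 66 | 71 | 14.1 | 27 | 7494 | 72 |
| Whitman College | Yes | 1861 | 998 | 359 | 45 | 77 | 1220 | 46 | 16670 | 4900 | 750 | 800 | 80 | 83 | 10.5 | 51 | 13198 | 72 |
| Whittier College | Yes | 1681 | 1069 | 344 | 35 | 63 | 1235 | 30 | 16249 | 5699 | 500 | 1998 | 84 | 92 | 13.6 | 29 | 11778 | 52 |
| Whitworth College | Yes | 1121 | 926 | 372 | 43 | 70 | 1270 | 160 | 12660 | 4500 | 678 | 2424 | 80 | 80 | 16.9 | 20 | 8328 | 80 |
| Widener University | Yes | 2139 | 1492 | 502 | 24 | 64 | 2186 | 2171 | 12350 | 5370 | 500 | 1350 | 88 | 86 | 12.6 | 19 | 9603 | 63 |
| Wilkes University | Yes | 1631 | 1431 | 434 | 15 | 36 | 1803 | 603 | 11150 | 5130 | 550 | 1260 | 78 | 92 | 13.3 | 24 | 8543 | 67 |
| Willamette University | Yes | 1658 | 1327 | 395 | 49 | 80 | 1595 | 159 | 14800 | 4620 | 400 | 790 | 91 | 94 | 13.3 | 37 | 10779 | 68 |
| William Jewell College | Yes | 663 | 547 | 315 | 32 | 67 | 1279 | 75 | 10060 | 2970 | 500 | 2600 | 74 | 80 | 11.2 | 19 | 7885 | 59 |
| William Woods University | Yes | 469 | 435 | 227 | 17 | 39 | 851 | 120 | 10535 | 4365 | 550 | 3700 | 39 | 66 | 12.9 | 16 | 7438 | 52 |
| Williams College | Yes | 4186 | 1245 | 526 | 81 | 96 | 1988 | 29 | 19629 | 5790 | 500 | 1200 | 94 | 99 | 9.0 | 64 | 22014 | 99 |
| Wilson College | Yes | 167 | 130 | 46 | 16 | 50 | 199 | 676 | 11428 | 5084 | 450 | 475 | 67 | 76 | 8.3 | 43 | 10291 | 67 |
| Wingate College | Yes | 1239 | 1017 | 383 | 10 | 34 | 1207 | 157 | 7820 | 3400 | 550 | 1550 | 69 | 81 | 13.9 | 8 | 7264 | 91 |
| Winona State University | No | 3325 | 2047 | 1301 | 20 | 45 | 5800 | 872 | 4200 | 2700 | 300 | 1200 | 53 | 60 | 20.2 | 18 | 5318 | 58 |
| Winthrop University | No | 2320 | 1805 | 769 | 24 | 61 | 3395 | 670 | 6400 | 3392 | 580 | 2150 | 71 | 80 | 12.8 | 26 | 6729 | 59 |
| Wisconsin Lutheran College | Yes | 152 | 128 | 75 | 17 | 41 | 282 | 22 | 9100 | 3700 | 500 | 1400 | 48 | 48 | 8.5 | 26 | 8960 | 50 |
| Wittenberg University | Yes | 1979 | 1739 | 575 | 42 | 68 | 1980 | 144 | 15948 | 4404 | 400 | 800 | 82 | 95 | 12.8 | 29 | 10414 | 78 |
| Wofford College | Yes | 1501 | 935 | 273 | 51 | 83 | 1059 | 34 | 12680 | 4150 | 605 | 1440 | 91 | 92 | 15.3 | 42 | 7875 | 75 |
| Worcester Polytechnic Institute | Yes | 2768 | 2314 | 682 | 49 | 86 | 2802 | 86 | 15884 | 5370 | 530 | 730 | 92 | 94 | 15.2 | 34 | 10774 | 82 |
| Worcester State College | No | 2197 | 1515 | 543 | 4 | 26 | 3089 | 2029 | 6797 | 3900 | 500 | 1200 | 60 | 60 | 21.0 | 14 | 4469 | 40 |
| Xavier University | Yes | 1959 | 1805 | 695 | 24 | 47 | 2849 | 1107 | 11520 | 4960 | 600 | 1250 | 73 | 75 | 13.3 | 31 | 9189 | 83 |
| Xavier University of Louisiana | Yes | 2097 | 1915 | 695 | 34 | 61 | 2793 | 166 | 6900 | 4200 | 617 | 781 | 67 | 75 | 14.4 | 20 | 8323 | 49 |
| Yale University | Yes | 10705 | 2453 | 1317 | 95 | 99 | 5217 | 83 | 19840 | 6510 | 630 | 2115 | 96 | 96 | 5.8 | 49 | 40386 | 99 |
| York College of Pennsylvania | Yes | 2989 | 1855 | 691 | 28 | 63 | 2988 | 1726 | 4990 | 3560 | 500 | 1250 | 75 | 75 | 18.1 | 28 | 4509 | 99 |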
In [ ]:
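# First, look at some basic information about the data

# Dimensions
dim(college)  # 777 x 19, i.e. 777 rows and 19 columns

rownames(x = college)  # row index; these are just numbers, the school names sit in column 1

colnames(x = college)  # column names; the first one turns out to be '...1'

college[,'...1'] # the school names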
  1. 777
  2. 19
  '1', '2', '3', ⋯, '776', '777'
  1. '...1'
  2. 'Private'
  3. 'Apps'
  4. 'Accept'
  5. 'Enroll'
  6. 'Top10perc'
  7. 'Top25perc'
  8. 'F.Undergrad'
  9. 'P.Undergrad'
  10. 'Outstate'
  11. 'Room.Board'
  12. 'Books'
  13. 'Personal'
  14. 'PhD'
  15. 'Terminal'
  16. 'S.F.Ratio'
  17. 'perc.alumni'
  18. 'Expend'
  19. 'Grad.Rate'
A tibble: 777 × 1
...1
<chr>
Abilene Christian University
Adelphi University
Adrian College
Agnes Scott College
Alaska Pacific University
Albertson College
Albertus Magnus College
Albion College
Albright College
Alderson-Broaddus College
Alfred University
Allegheny College
Allentown Coll. of St. Francis de Sales
Alma College
Alverno College
American International College
Amherst College
Anderson University
Andrews University
Angelo State University
Antioch University
Appalachian State University
Aquinas College
Arizona State University Main campus
Arkansas College (Lyon College)
Arkansas Tech University
Assumption College
Auburn University-Main Campus
Augsburg College
Augustana College IL
⋮
Westfield State College
Westminster College MO
Westminster College
Westminster College of Salt Lake City
Westmont College
Wheaton College IL
Westminster College PA
Wheeling Jesuit College
Whitman College
Whittier College
Whitworth College
Widener University
Wilkes University
Willamette University
William Jewell College
William Woods University
Williams College
Wilson College
Wingate College
Winona State University
Winthrop University
Wisconsin Lutheran College
Wittenberg University
Wofford College
Worcester Polytechnic Institute
Worcester State College
Xavier University
Xavier University of Louisiana
Yale University
York College of Pennsylvania
In [ ]:
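# The issue above: the school names should be set as the row names, i.e. the observation labels

college <- read.csv("/data1/project/College.csv")

# rownames(college) <- college[, "X"]  # indexing by column name (read.csv names the first column "X", not "...1")

rownames(college) <- college[,1]  # indexing by column position; this is the approach used here
# head(college)

# view(college)  # needs to be viewed in RStudio; a similar extension can be configured in VS Code

# For now, just use head() or rownames() to check the result

head(college)

# rownames(college) 

college <- college[, -1] # drop column 1

head(college)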
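A data.frame: 6 × 19

| | X | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> |
| Abilene Christian University | Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |

A data.frame: 6 × 18

| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |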
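As an aside, base R can also set the row names while reading the file, which saves the two manual steps above. This is a minimal sketch, assuming a CSV whose first (unnamed) column holds the labels; the temporary file and its values are made up for illustration and only mimic the layout of College.csv.

```r
# Write a tiny stand-in CSV (hypothetical values) with an unnamed first column
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps"',
  '"School A","Yes",100',
  '"School B","No",250'
), tmp)

# row.names = 1 tells read.csv to use the first column as the row names,
# so that column does not appear among the variables
toy <- read.csv(tmp, row.names = 1)
print(toy)
print(rownames(toy))
```

The same argument should work on the real file, e.g. `read.csv("/data1/project/College.csv", row.names = 1)`.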
In [ ]:
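# The approach above works for a data.frame; since we are using tidyverse, tibble has a dedicated function for turning a column into row names
college <- read_csv("/data1/project/College.csv",col_names = TRUE)
college <- college %>% column_to_rownames(var = "...1")
# With this approach there is no need to drop column 1 by hand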

head(college)
# rownames(college)
New names:
• `` -> `...1`
Rows: 777 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): ...1, Private
dbl (17): Apps, Accept, Enroll, Top10perc, Top25perc, F.Undergrad, P.Undergr...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
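A data.frame: 6 × 18

| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |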
In [ ]:
# Next, some basic statistics
summary(college) # gives summary statistics for each column
   Private               Apps           Accept          Enroll    
 Length:777         Min.   :   81   Min.   :   72   Min.   :  35  
 Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242  
 Mode  :character   Median : 1558   Median : 1110   Median : 434  
                    Mean   : 3002   Mean   : 2019   Mean   : 780  
                    3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902  
                    Max.   :48094   Max.   :26330   Max.   :6392  
   Top10perc       Top25perc      F.Undergrad     P.Undergrad     
 Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0  
 1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0  
 Median :23.00   Median : 54.0   Median : 1707   Median :  353.0  
 Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3  
 3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0  
 Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0  
    Outstate       Room.Board       Books           Personal   
 Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250  
 1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850  
 Median : 9990   Median :4200   Median : 500.0   Median :1200  
 Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341  
 3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700  
 Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800  
      PhD            Terminal       S.F.Ratio      perc.alumni   
 Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00  
 1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00  
 Median : 75.00   Median : 82.0   Median :13.60   Median :21.00  
 Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74  
 3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00  
 Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00  
     Expend        Grad.Rate     
 Min.   : 3186   Min.   : 10.00  
 1st Qu.: 6751   1st Qu.: 53.00  
 Median : 8377   Median : 65.00  
 Mean   : 9660   Mean   : 65.46  
 3rd Qu.:10830   3rd Qu.: 78.00  
 Max.   :56233   Max.   :118.00  
In [ ]:
college <- read.csv("/data1/project/College.csv")
rownames(college) <- college[,1]  # indexing by column position; this is the approach used here
college <- college[, -1] # drop column 1
college
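A data.frame: 777 × 18

| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 |
| Albertus Magnus College | Yes | 353 | 340 | 103 | 17 | 45 | 416 | 230 | 13290 | 5720 | 500 | 1500 | 90 | 93 | 11.5 | 26 | 8861 | 63 |
| Albion College | Yes | 1899 | 1720 | 489 | 37 | 68 | 1594 | 32 | 13868 | 4826 | 450 | 850 | 89 | 100 | 13.7 | 37 | 11487 | 73 |
| Albright College | Yes | 1038 | 839 | 227 | 30 | 63 | 973 | 306 | 15595 | 4400 | 300 | 500 | 79 | 84 | 11.3 | 23 | 11644 | 80 |
| Alderson-Broaddus College | Yes | 582 | 498 | 172 | 21 | 44 | 799 | 78 | 10468 | 3380 | 660 | 1800 | 40 | 41 | 11.5 | 15 | 8991 | 52 |
| Alfred University | Yes | 1732 | 1425 | 472 | 37 | 75 | 1830 | 110 | 16548 | 5406 | 500 | 600 | 82 | 88 | 11.3 | 31 | 10932 | 73 |
| Allegheny College | Yes | 2652 | 1900 | 484 | 44 | 77 | 1707 | 44 | 17080 | 4440 | 400 | 600 | 73 | 91 | 9.9 | 41 | 11711 | 76 |
| Allentown Coll. of St. Francis de Sales | Yes | 1179 | 780 | 290 | 38 | 64 | 1130 | 638 | 9690 | 4785 | 600 | 1000 | 60 | 84 | 13.3 | 21 | 7940 | 74 |
| Alma College | Yes | 1267 | 1080 | 385 | 44 | 73 | 1306 | 28 | 12572 | 4552 | 400 | 400 | 79 | 87 | 15.3 | 32 | 9305 | 68 |
| Alverno College | Yes | 494 | 313 | 157 | 23 | 46 | 1317 | 1235 | 8352 | 3640 | 650 | 2449 | 36 | 69 | 11.1 | 26 | 8127 | 55 |
| American International College | Yes | 1420 | 1093 | 220 | 9 | 22 | 1018 | 287 | 8700 | 4780 | 450 | 1400 | 78 | 84 | 14.7 | 19 | 7355 | 69 |
| Amherst College | Yes | 4302 | 992 | 418 | 83 | 96 | 1593 | 5 | 19760 | 5300 | 660 | 1598 | 93 | 98 | 8.4 | 63 | 21424 | 100 |
| Anderson University | Yes | 1216 | 908 | 423 | 19 | 40 | 1819 | 281 | 10100 | 3520 | 550 | 1100 | 48 | 61 | 12.1 | 14 | 7994 | 59 |
| Andrews University | Yes | 1130 | 704 | 322 | 14 | 23 | 1586 | 326 | 9996 | 3090 | 900 | 1320 | 62 | 66 | 11.5 | 18 | 10908 | 46 |
| Angelo State University | No | 3540 | 2001 | 1016 | 24 | 54 | 4190 | 1512 | 5130 | 3592 | 500 | 2000 | 60 | 62 | 23.1 | 5 | 4010 | 34 |
| Antioch University | Yes | 713 | 661 | 252 | 25 | 44 | 712 | 23 | 15476 | 3336 | 400 | 1100 | 69 | 82 | 11.3 | 35 | 42926 | 48 |
| Appalachian State University | No | 7313 | 4664 | 1910 | 20 | 63 | 9940 | 1035 | 6806 | 2540 | 96 | 2000 | 83 | 96 | 18.3 | 14 | 5854 | 70 |
| Aquinas College | Yes | 619 | 516 | 219 | 20 | 51 | 1251 | 767 | 11208 | 4124 | 350 | 1615 | 55 | 65 | 12.7 | 25 | 6584 | 65 |
| Arizona State University Main campus | No | 12809 | 10308 | 3761 | 24 | 49 | 22593 | 7585 | 7434 | 4850 | 700 | 2100 | 88 | 93 | 18.9 | 5 | 4602 | 48 |
| Arkansas College (Lyon College) | Yes | 708 | 334 | 166 | 46 | 74 | 530 | 182 | 8644 | 3922 | 500 | 800 | 79 | 88 | 12.6 | 24 | 14579 | 54 |
| Arkansas Tech University | No | 1734 | 1729 | 951 | 12 | 52 | 3602 | 939 | 3460 | 2650 | 450 | 1000 | 57 | 60 | 19.6 | 5 | 4739 | 48 |
| Assumption College | Yes | 2135 | 1700 | 491 | 23 | 59 | 1708 | 689 | 12000 | 5920 | 500 | 500 | 93 | 93 | 13.8 | 30 | 7100 | 88 |
| Auburn University-Main Campus | No | 7548 | 6791 | 3070 | 25 | 57 | 16262 | 1716 | 6300 | 3933 | 600 | 1908 | 85 | 91 | 16.7 | 18 | 6642 | 69 |
| Augsburg College | Yes | 662 | 513 | 257 | 12 | 30 | 2074 | 726 | 11902 | 4372 | 540 | 950 | 65 | 65 | 12.8 | 31 | 7836 | 58 |
| Augustana College IL | Yes | 1879 | 1658 | 497 | 36 | 69 | 1950 | 38 | 13353 | 4173 | 540 | 821 | 78 | 83 | 12.7 | 40 | 9220 | 71 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Westfield State College | No | 3100 | 2150 | 825 | 3 | 20 | 3234 | 941 | 5542 | 3788 | 500 | 1300 | 75 | 79 | 15.7 | 20 | 4222 | 65 |
| Westminster College MO | Yes | 662 | 553 | 184 | 20 | 43 | 665 | 37 | 10720 | 4050 | 600 | 1650 | 66 | 70 | 12.5 | 20 | 7925 | 62 |
| Westminster College | Yes | 996 | 866 | 377 | 29 | 58 | 1411 | 72 | 12065 | 3615 | 430 | 685 | 62 | 78 | 12.5 | 41 | 8596 | 80 |
| Westminster College of Salt Lake City | Yes | 917 | 720 | 213 | 21 | 60 | 979 | 743 | 8820 | 4050 | 600 | 2025 | 68 | 83 | 10.5 | 34 | 7170 | 50 |
| Westmont College | No | 950 | 713 | 351 | 42 | 72 | 1276 | 9 | 14320 | 5304 | 490 | 1410 | 77 | 77 | 14.9 | 17 | 8837 | 87 |
| Wheaton College IL | Yes | 1432 | 920 | 548 | 56 | 84 | 2200 | 56 | 11480 | 4200 | 530 | 1400 | 81 | 83 | 12.7 | 40 | 11916 | 85 |
| Westminster College PA | Yes | 1738 | 1373 | 417 | 21 | 55 | 1335 | 30 | 18460 | 5970 | 700 | 850 | 92 | 96 | 13.2 | 41 | 22704 | 71 |
| Wheeling Jesuit College | Yes | 903 | 755 | 213 | 15 | 49 | 971 | 305 | 10500 | 4545 | 600 | 600 | 66 | 71 | 14.1 | 27 | 7494 | 72 |
| Whitman College | Yes | 1861 | 998 | 359 | 45 | 77 | 1220 | 46 | 16670 | 4900 | 750 | 800 | 80 | 83 | 10.5 | 51 | 13198 | 72 |
| Whittier College | Yes | 1681 | 1069 | 344 | 35 | 63 | 1235 | 30 | 16249 | 5699 | 500 | 1998 | 84 | 92 | 13.6 | 29 | 11778 | 52 |
| Whitworth College | Yes | 1121 | 926 | 372 | 43 | 70 | 1270 | 160 | 12660 | 4500 | 678 | 2424 | 80 | 80 | 16.9 | 20 | 8328 | 80 |
| Widener University | Yes | 2139 | 1492 | 502 | 24 | 64 | 2186 | 2171 | 12350 | 5370 | 500 | 1350 | 88 | 86 | 12.6 | 19 | 9603 | 63 |
| Wilkes University | Yes | 1631 | 1431 | 434 | 15 | 36 | 1803 | 603 | 11150 | 5130 | 550 | 1260 | 78 | 92 | 13.3 | 24 | 8543 | 67 |
| Willamette University | Yes | 1658 | 1327 | 395 | 49 | 80 | 1595 | 159 | 14800 | 4620 | 400 | 790 | 91 | 94 | 13.3 | 37 | 10779 | 68 |
| William Jewell College | Yes | 663 | 547 | 315 | 32 | 67 | 1279 | 75 | 10060 | 2970 | 500 | 2600 | 74 | 80 | 11.2 | 19 | 7885 | 59 |
| William Woods University | Yes | 469 | 435 | 227 | 17 | 39 | 851 | 120 | 10535 | 4365 | 550 | 3700 | 39 | 66 | 12.9 | 16 | 7438 | 52 |
| Williams College | Yes | 4186 | 1245 | 526 | 81 | 96 | 1988 | 29 | 19629 | 5790 | 500 | 1200 | 94 | 99 | 9.0 | 64 | 22014 | 99 |
| Wilson College | Yes | 167 | 130 | 46 | 16 | 50 | 199 | 676 | 11428 | 5084 | 450 | 475 | 67 | 76 | 8.3 | 43 | 10291 | 67 |
| Wingate College | Yes | 1239 | 1017 | 383 | 10 | 34 | 1207 | 157 | 7820 | 3400 | 550 | 1550 | 69 | 81 | 13.9 | 8 | 7264 | 91 |
| Winona State University | No | 3325 | 2047 | 1301 | 20 | 45 | 5800 | 872 | 4200 | 2700 | 300 | 1200 | 53 | 60 | 20.2 | 18 | 5318 | 58 |
| Winthrop University | No | 2320 | 1805 | 769 | 24 | 61 | 3395 | 670 | 6400 | 3392 | 580 | 2150 | 71 | 80 | 12.8 | 26 | 6729 | 59 |
| Wisconsin Lutheran College | Yes | 152 | 128 | 75 | 17 | 41 | 282 | 22 | 9100 | 3700 | 500 | 1400 | 48 | 48 | 8.5 | 26 | 8960 | 50 |
| Wittenberg University | Yes | 1979 | 1739 | 575 | 42 | 68 | 1980 | 144 | 15948 | 4404 | 400 | 800 | 82 | 95 | 12.8 | 29 | 10414 | 78 |
| Wofford College | Yes | 1501 | 935 | 273 | 51 | 83 | 1059 | 34 | 12680 | 4150 | 605 | 1440 | 91 | 92 | 15.3 | 42 | 7875 | 75 |
| Worcester Polytechnic Institute | Yes | 2768 | 2314 | 682 | 49 | 86 | 2802 | 86 | 15884 | 5370 | 530 | 730 | 92 | 94 | 15.2 | 34 | 10774 | 82 |
| Worcester State College | No | 2197 | 1515 | 543 | 4 | 26 | 3089 | 2029 | 6797 | 3900 | 500 | 1200 | 60 | 60 | 21.0 | 14 | 4469 | 40 |
| Xavier University | Yes | 1959 | 1805 | 695 | 24 | 47 | 2849 | 1107 | 11520 | 4960 | 600 | 1250 | 73 | 75 | 13.3 | 31 | 9189 | 83 |
| Xavier University of Louisiana | Yes | 2097 | 1915 | 695 | 34 | 61 | 2793 | 166 | 6900 | 4200 | 617 | 781 | 67 | 75 | 14.4 | 20 | 8323 | 49 |
| Yale University | Yes | 10705 | 2453 | 1317 | 95 | 99 | 5217 | 83 | 19840 | 6510 | 630 | 2115 | 96 | 96 | 5.8 | 49 | 40386 | 99 |
| York College of Pennsylvania | Yes | 2989 | 1855 | 691 | 28 | 63 | 2988 | 1726 | 4990 | 3560 | 500 | 1250 | 75 | 75 | 18.1 | 28 | 4509 | 99 |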
In [ ]:
# c

# college[,1:10] 

college <- read.csv("/data1/project/College.csv")
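rownames(college) <- college[,1]  # indexing by column position; this is the approach used here
college <- college[, -1] # drop column 1

# Private is a chr column, so drop it before drawing the scatterplot matrix
# college[,-1][,1:10] # this drops the Private column
pairs(college[,-1][,1:10]) # pairwise scatter plot, i.e. a matrix of scatterplots for every pair of variables

# The plot cannot be viewed in VS Code; a screenshot of it is shown in the markdown cell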
[Figure: scatterplot matrix of the first ten numeric columns of college]

image.png
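Dropping `Private` by position works here because it happens to be the first column. A slightly more robust variant is to select the numeric columns by type. The sketch below uses a small made-up data frame (the values are hypothetical) so that it runs on its own:

```r
# Toy data frame with one character column (hypothetical values)
toy <- data.frame(
  Private = c("Yes", "No", "Yes", "No"),
  Apps = c(100, 250, 80, 400),
  Accept = c(60, 200, 50, 310),
  Enroll = c(30, 90, 20, 150)
)

# Keep only the numeric columns, wherever they sit in the data frame
num_cols <- vapply(toy, is.numeric, logical(1))
print(names(toy)[num_cols])

# Scatterplot matrix of the numeric columns only
pairs(toy[, num_cols])
```

Applied to the real data, the same idea would read `pairs(college[, vapply(college, is.numeric, logical(1))][, 1:10])`.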

In [ ]:
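# Simple plot: a boxplot
boxplot(Outstate~Private,data=college)


# It turned out the problem was not Jupyter's rendering but the theme: with a dark theme, the plots before and after this one get a black background, so the theme needs to be set to light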
[Figure: boxplot of Outstate by Private]

image.png

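It turned out the problem was not Jupyter's rendering but the theme: with a dark theme, the plots before and after this point get a black background, so the theme needs to be set to light.

That said, opening the notebook in the web version of Jupyter should render it consistently, and the plots are visible there.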

In [ ]:
# library(tidyverse)
college %>% ggplot(mapping = aes(y=Outstate, x=Private)) + geom_boxplot() 

# Unless the theme is changed, ggplot figures have a white background, so we can simply use ggplot
[Figure: ggplot boxplot of Outstate by Private]
In [ ]:
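Elite <- rep("No", nrow(college))  # initialize to "No", repeated nrow(college) times; the type is character, like "No"
# class(rep("No", nrow(college)) ) # "character"
Elite[college$Top10perc > 50] <- "Yes" # relies on the vector matching the data frame rows one-to-one; effectively selects rows by index
Elite <- as.factor(Elite) # convert to a factor
college <- data.frame(college, Elite) # append Elite to the data frame as a new column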
In [ ]:
head(college)
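A data.frame: 6 × 19

| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | Elite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | <fct> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 | No |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 | No |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 | No |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 | Yes |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 | No |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 | No |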
In [ ]:
# The same can be done with tidyverse, e.g. by adding a new column called elite
college %>% mutate(elite = ifelse(Top10perc > 50, "Yes", "No")) %>% head()
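A data.frame: 6 × 20

| | Private | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal | PhD | Terminal | S.F.Ratio | perc.alumni | Expend | Grad.Rate | Elite | elite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <chr> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <dbl> | <int> | <int> | <int> | <fct> | <chr> |
| Abilene Christian University | Yes | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 | 70 | 78 | 18.1 | 12 | 7041 | 60 | No | No |
| Adelphi University | Yes | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 | 29 | 30 | 12.2 | 16 | 10527 | 56 | No | No |
| Adrian College | Yes | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 | 53 | 66 | 12.9 | 30 | 8735 | 54 | No | No |
| Agnes Scott College | Yes | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 | 92 | 97 | 7.7 | 37 | 19016 | 59 | Yes | Yes |
| Alaska Pacific University | Yes | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 | 76 | 72 | 11.9 | 2 | 10922 | 15 | No | No |
| Albertson College | Yes | 587 | 479 | 158 | 38 | 62 | 678 | 41 | 13500 | 3335 | 500 | 675 | 67 | 73 | 9.4 | 11 | 9727 | 55 | No | No |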
In [ ]:
summary(college$Elite)
boxplot(Outstate~Elite,college) # ggplot is still the recommended way to plot
college %>% ggplot(mapping = aes(y=Outstate, x=Elite)) + geom_boxplot()
 No Yes 
699  78 
[Figures: boxplots of Outstate by Elite, one from boxplot() and one from ggplot]
In [ ]:
# Next, the histograms
par(mfrow=c(2,2))
hist(college$Apps)
hist(college$Accept)
hist(college$Enroll)
hist(college$PhD) 

# ggplot is recommended
college %>% ggplot(mapping = aes(x=Apps)) + geom_histogram()
college %>% ggplot(mapping = aes(x=Accept)) + geom_histogram()
college %>% ggplot(mapping = aes(x=Enroll)) + geom_histogram()
college %>% ggplot(mapping = aes(x=PhD)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: ggplot histogram of Apps]
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: ggplot histogram of Accept]
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: ggplot histogram of Enroll]
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: ggplot histogram of PhD]
[Figure: 2 × 2 grid of base R histograms of Apps, Accept, Enroll and PhD]

image.png

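The histograms show the number of applications, the number of acceptances, the number of enrollments, and the percentage of faculty with a PhD.

The first three shrink from one to the next (applications, then acceptances, then enrollments), which suggests that getting into college is competitive; the last one gives a sense of the faculty's level of education.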
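The exercise asks for histograms with differing numbers of bins, and the `stat_bin()` message also suggests choosing the bin width deliberately. The following is a minimal base R sketch on made-up data; the vector `apps` is a hypothetical stand-in for a column such as `Apps`, and the bin width of 1000 is an arbitrary choice.

```r
set.seed(3)
# Hypothetical right-skewed counts standing in for a column such as Apps
apps <- round(rexp(500, rate = 1 / 3000))

# Explicit, equally spaced break points with a bin width of 1000
breaks <- seq(0, ceiling(max(apps) / 1000) * 1000, by = 1000)
h <- hist(apps, breaks = breaks, main = "Bin width = 1000", xlab = "apps")

# The returned object holds the break points and the count in each bin
print(head(h$counts))
```

In ggplot the same choice is made through the `bins` or `binwidth` argument of `geom_histogram()`, for example `geom_histogram(binwidth = 1000)`.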

In [ ]:
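# Free exploration: essentially group by some variables with tidyverse, look at summary statistics of other variables, then run some tests

# For example, compare public and private schools on faculty qualifications, i.e. the percentage of faculty with a PhD
college %>% group_by(Private) %>% summarise(mean(PhD)) # private schools have a lower PhD percentage, which is a bit surprising

college %>% group_by(Private) %>% summarise(mean(PhD)) %>% ggplot(mapping=aes(x=Private,y=`mean(PhD)`)) + geom_bar(stat='identity') # bar chart of the two group means
# Then check the difference with a t-test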
t.test(college$PhD~college$Private)
A tibble: 2 × 2

| Private | mean(PhD) |
|---|---|
| <chr> | <dbl> |
| No | 76.83491 |
| Yes | 71.09381 |
	Welch Two Sample t-test

data:  college$PhD by college$Private
t = 5.1381, df = 531.86, p-value = 3.904e-07
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 3.546110 7.936091
sample estimates:
 mean in group No mean in group Yes 
         76.83491          71.09381 
[Figure: bar chart of mean PhD percentage by Private]
In [ ]:
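# Next, compare elite and non-elite schools on the PhD percentage and on the ratio of acceptances to enrollments
college %>% group_by(Elite) %>% summarise(mean(PhD)) # elite schools have a higher PhD percentage, as expected

college %>% group_by(Elite) %>% summarise(mean(PhD)) %>% ggplot(mapping=aes(x=Elite,y=`mean(PhD)`)) + geom_bar(stat='identity')

college %>% group_by(Elite) %>% summarise(mean(Accept/Enroll)) # note: Accept/Enroll is acceptances per enrolled student, not the acceptance rate (that would be Accept/Apps); it is slightly higher for elite schools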
A tibble: 2 × 2

| Elite | mean(PhD) |
|---|---|
| <fct> | <dbl> |
| No | 70.80114 |
| Yes | 89.32051 |

A tibble: 2 × 2

| Elite | mean(Accept/Enroll) |
|---|---|
| <fct> | <dbl> |
| No | 2.660999 |
| Yes | 2.873612 |

[Figure: bar chart of mean PhD percentage by Elite]
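To keep the ratios apart, it may help to compute the acceptance rate (`Accept / Apps`) and the yield (`Enroll / Accept`) side by side. The sketch below uses a small made-up data frame whose counts are hypothetical and not taken from College.csv; the column names `accept_rate` and `yield` are our own.

```r
# Hypothetical counts for four schools (not taken from College.csv)
toy <- data.frame(
  Elite = factor(c("No", "No", "Yes", "Yes")),
  Apps = c(1000, 2000, 5000, 8000),
  Accept = c(800, 1500, 1500, 2000),
  Enroll = c(400, 500, 600, 1000)
)

toy$accept_rate <- toy$Accept / toy$Apps  # share of applicants who are accepted
toy$yield <- toy$Enroll / toy$Accept      # share of accepted students who enroll

# Group means of both ratios
rates <- aggregate(cbind(accept_rate, yield) ~ Elite, data = toy, FUN = mean)
print(rates)
```

On the real data the same two columns could be added with `mutate()` before the `group_by(Elite)` step.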

10,

In [ ]:
# a

# install.packages("ISLR2")
library(ISLR2)
head(Boston)
# ?Boston 

# A data.frame: 506 × 13
# The dimensions can also be checked with dim()
dim(Boston)
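A data.frame: 6 × 13

| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 5.21 | 28.7 |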
  1. 506
  2. 13
In [ ]:
# b

# Pairwise scatterplots again, using pairs()
glimpse(Boston) # every column is numeric (dbl or int), so pairs() works without conversion
pairs(Boston)
Rows: 506
Columns: 13
$ crim    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
$ zn      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
$ indus   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
$ chas    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nox     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
$ rm      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
$ age     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
$ dis     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
$ rad     <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
$ tax     <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
$ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
$ lstat   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
$ medv    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…

image.png

image.png

In [ ]:
# c

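# The full pairs plot above is too crowded to read, so it is better to pick a few variables and use ggplot
# Here we look at crim against dis, rad and ptratio

Boston %>% ggplot(mapping=aes(x = dis,y = crim))+geom_point() # dis vs crim: crime rates tend to be lower farther from the employment centres
Boston %>% ggplot(mapping=aes(x = rad,y = crim))+geom_point() # rad vs crim: no smooth trend, but the high crime rates appear mainly where highway accessibility (rad) is highest
Boston %>% ggplot(mapping=aes(x = ptratio,y = crim))+geom_point() # ptratio vs crim: suburbs with a high pupil-teacher ratio also tend to have higher crime rates

# Many other variables could be compared with crim in the same way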
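
Rather than inspecting scatterplots one pair at a time, one option is to rank all predictors by the strength of their association with `crim`. The sketch below is one possible way to do that; the helper name `rank_by_cor` is an assumption, and the `toy` data frame is simulated so that the block runs on its own. Spearman correlation is used because `crim` is heavily right-skewed (its median is about 0.26 while its maximum is about 89).

```r
# Rank every other numeric column by its correlation with a target column.
# Illustrative helper; strongest absolute correlation first.
rank_by_cor <- function(df, target, method = "spearman") {
  others <- setdiff(names(df), target)
  r <- sapply(others, function(v) cor(df[[v]], df[[target]], method = method))
  r[order(abs(r), decreasing = TRUE)]
}

# Hypothetical toy data that only mimics some Boston column names
set.seed(42)
toy <- data.frame(dis = runif(100, 1, 12))
toy$crim    <- exp(-0.5 * toy$dis + rnorm(100, sd = 0.3))  # decreases with dis by construction
toy$ptratio <- rnorm(100, 18, 2)                            # unrelated to crim by construction

ranked <- rank_by_cor(toy, "crim")
print(round(ranked, 2))
```

With `library(ISLR2)` loaded, `rank_by_cor(Boston, "crim")` applies the same idea to the real data. A correlation only summarises a monotone association, so the scatterplots are still worth checking.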
In [ ]:
# ?Boston #  ‘crim’ per capita crime rate by town.
# How should "particularly high crime rates" be judged? One rough option is to treat crim above 1 as high; either way, the distribution of crim is the place to start
head(Boston) # 506 suburbs of Boston
A data.frame: 6 × 13
      crim    zn indus  chas    nox    rm   age    dis   rad   tax ptratio lstat  medv
     <dbl> <dbl> <dbl> <int>  <dbl> <dbl> <dbl>  <dbl> <int> <dbl>   <dbl> <dbl> <dbl>
1  0.00632    18  2.31     0  0.538 6.575  65.2 4.0900     1   296    15.3  4.98  24.0
2  0.02731     0  7.07     0  0.469 6.421  78.9 4.9671     2   242    17.8  9.14  21.6
3  0.02729     0  7.07     0  0.469 7.185  61.1 4.9671     2   242    17.8  4.03  34.7
4  0.03237     0  2.18     0  0.458 6.998  45.8 6.0622     3   222    18.7  2.94  33.4
5  0.06905     0  2.18     0  0.458 7.147  54.2 6.0622     3   222    18.7  5.33  36.2
6  0.02985     0  2.18     0  0.458 6.430  58.7 6.0622     3   222    18.7  5.21  28.7
In [ ]:
# d

Boston %>% ggplot(mapping=aes(x = crim))+geom_histogram()
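Boston %>% ggplot(mapping=aes(x = crim))+geom_histogram(bins = 10) # most suburbs have low crime rates, but a few exceed 75, which is striking for a per capita rate

Boston %>% ggplot(mapping=aes(x = tax))+geom_histogram() # the median tax rate is 330, but the 3rd quartile is already 666, so a sizeable group of suburbs sits at the high end

Boston %>% ggplot(mapping=aes(x = ptratio))+geom_histogram() # pupil-teacher ratios are mostly on the high side (median 19.05); the inverse, teachers per pupil, would arguably be the more intuitive measure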
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
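
Part (d) also asks for the range of each predictor. As a rough numeric complement to the histograms, the sketch below reports the range of each chosen column and counts the values above the upper Tukey fence (Q3 + 1.5 × IQR). The fence is only a rule of thumb chosen here for illustration, the helper name `high_value_summary` is an assumption, and `toy` is made-up data.

```r
# Range of each variable plus the number of values above Q3 + 1.5 * IQR.
# Illustrative helper; the fence is a rule of thumb, not a formal test.
high_value_summary <- function(df, vars) {
  rows <- lapply(vars, function(v) {
    x <- df[[v]]
    fence <- unname(quantile(x, 0.75)) + 1.5 * IQR(x)
    data.frame(variable = v, min = min(x), max = max(x),
               upper_fence = fence, n_above = sum(x > fence))
  })
  do.call(rbind, rows)
}

# Hypothetical toy data: one extreme crim value, no extreme tax value
toy <- data.frame(
  crim = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.1, 0.3, 75),
  tax  = c(300, 310, 290, 305, 295, 300, 310, 320)
)
res <- high_value_summary(toy, c("crim", "tax"))
print(res)
```

On the real data this could be called as `high_value_summary(Boston, c("crim", "tax", "ptratio"))`. A variable with a large high cluster, such as `tax` with its 3rd quartile at 666, can show no points above the fence, so the histogram remains the better guide there.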
In [ ]:
# e
Boston %>% filter(chas==1) %>% nrow() # 35 suburbs bound the Charles River
35
In [ ]:
# f

Boston %>% select(ptratio) %>% summary() # the median is 19.05, with a maximum of 22 and a minimum of 12.6, so the spread is fairly narrow
    ptratio     
 Min.   :12.60  
 1st Qu.:17.40  
 Median :19.05  
 Mean   :18.46  
 3rd Qu.:20.20  
 Max.   :22.00  
In [ ]:
# g

# ?Boston ‘medv’ median value of owner-occupied homes in $1000s.
Boston %>% filter(medv == min(medv)) # two suburbs tie for the lowest medv, but this output does not show their row indices

subset(Boston,medv==min(Boston$medv)) # subset() keeps the row names: rows 399 and 406
A data.frame: 2 × 13
    crim    zn indus  chas    nox    rm   age    dis   rad   tax ptratio lstat  medv
   <dbl> <dbl> <dbl> <int>  <dbl> <dbl> <dbl>  <dbl> <int> <dbl>   <dbl> <dbl> <dbl>
 38.3518     0  18.1     0  0.693 5.453   100 1.4896    24   666    20.2 30.59     5
 67.9208     0  18.1     0  0.693 5.683   100 1.4254    24   666    20.2 22.98     5
A data.frame: 2 × 13
        crim    zn indus  chas    nox    rm   age    dis   rad   tax ptratio lstat  medv
       <dbl> <dbl> <dbl> <int>  <dbl> <dbl> <dbl>  <dbl> <int> <dbl>   <dbl> <dbl> <dbl>
399  38.3518     0  18.1     0  0.693 5.453   100 1.4896    24   666    20.2 30.59     5
406  67.9208     0  18.1     0  0.693 5.683   100 1.4254    24   666    20.2 22.98     5
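
Part (g) also asks how the other predictors of these suburbs compare with their overall ranges. One way to put a number on that is the empirical percentile of each value within its column. The sketch below assumes a plain `data.frame` (as `Boston` is); the helper name `percentile_of_rows` and the `toy` values are illustrative assumptions. `which()` is used to recover the row indices directly.

```r
# For the selected rows, report the share of observations in each column
# that are less than or equal to the row's value (empirical CDF).
percentile_of_rows <- function(df, rows) {
  pct <- sapply(names(df), function(v) ecdf(df[[v]])(df[rows, v]))
  pct <- matrix(pct, nrow = length(rows),
                dimnames = list(as.character(rows), names(df)))
  round(pct, 2)
}

# Hypothetical toy data reusing two Boston column names
toy <- data.frame(
  medv  = c(5, 20, 25, 30, 50),
  lstat = c(30, 12, 10, 8, 3)
)

lowest <- which(toy$medv == min(toy$medv))   # row indices of the minimum
res <- percentile_of_rows(toy, lowest)
print(res)
```

For the real data, `percentile_of_rows(Boston, which(Boston$medv == min(Boston$medv)))` would show, for rows 399 and 406, which predictors sit near the top or bottom of their distributions.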
In [ ]:
# h

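# ?Boston #  ‘rm’ average number of rooms per dwelling.
Boston %>% filter(rm>7) # 64 suburbs average more than 7 rooms per dwelling
Boston %>% filter(rm>8) # 13 suburbs average more than 8 rooms per dwelling

# Summary statistics for the rm > 8 suburbs
Boston %>% filter(rm>8) %>% summary()
# Summary statistics for the full data, for comparison
Boston %>% summary()

# Comparing the means is enough here:
# the rm > 8 suburbs have lower crim (0.72 vs 3.61), lower lstat (4.31 vs 12.65) and higher medv (44.2 vs 22.53); other variables can be compared the same way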
A data.frame: 64 × 13
    crim    zn indus  chas    nox    rm   age    dis   rad   tax ptratio lstat  medv
   <dbl> <dbl> <dbl> <int>  <dbl> <dbl> <dbl>  <dbl> <int> <dbl>   <dbl> <dbl> <dbl>
 0.02729   0.0  7.07     0 0.4690 7.185  61.1 4.9671     2   242    17.8  4.03  34.7
 0.06905   0.0  2.18     0 0.4580 7.147  54.2 6.0622     3   222    18.7  5.33  36.2
 0.03359  75.0  2.95     0 0.4280 7.024  15.8 5.4011     3   252    18.3  1.98  34.9
 0.01311  90.0  1.22     0 0.4030 7.249  21.9 8.6966     5   226    17.9  4.81  35.4
 0.01951  17.5  1.38     0 0.4161 7.104  59.5 9.2229     3   216    18.6  8.05  33.0
 0.05660   0.0  3.41     0 0.4890 7.007  86.3 3.4217     2   270    17.8  5.50  23.6
 0.05302   0.0  3.41     0 0.4890 7.079  63.1 3.4145     2   270    17.8  5.70  28.7
 0.12083   0.0  2.89     0 0.4450 8.069  76.0 3.4952     2   276    18.0  4.21  38.7
 0.08187   0.0  2.89     0 0.4450 7.820  36.9 3.4952     2   276    18.0  3.57  43.8
 0.06860   0.0  2.89     0 0.4450 7.416  62.5 3.4952     2   276    18.0  6.19  33.2
 1.46336   0.0 19.58     0 0.6050 7.489  90.8 1.9709     5   403    14.7  1.73  50.0
 1.83377   0.0 19.58     1 0.6050 7.802  98.2 2.0407     5   403    14.7  1.92  50.0
 1.51902   0.0 19.58     1 0.6050 8.375  93.9 2.1620     5   403    14.7  3.32  50.0
 2.01019   0.0 19.58     0 0.6050 7.929  96.2 2.0459     5   403    14.7  3.70  50.0
 0.06588   0.0  2.46     0 0.4880 7.765  83.3 2.7410     3   193    17.8  7.56  39.8
 0.09103   0.0  2.46     0 0.4880 7.155  92.2 2.7006     3   193    17.8  4.82  37.9
 0.05602   0.0  2.46     0 0.4880 7.831  53.6 3.1992     3   193    17.8  4.45  50.0
 0.08370  45.0  3.44     0 0.4370 7.185  38.9 4.5667     5   398    15.2  5.39  34.9
 0.08664  45.0  3.44     0 0.4370 7.178  26.3 6.4798     5   398    15.2  2.87  36.4
 0.01381  80.0  0.46     0 0.4220 7.875  32.0 5.6484     4   255    14.4  2.97  50.0
 0.04011  80.0  1.52     0 0.4040 7.287  34.1 7.3090     2   329    12.6  4.08  33.3
 0.04666  80.0  1.52     0 0.4040 7.107  36.6 7.3090     2   329    12.6  8.61  30.3
 0.03768  80.0  1.52     0 0.4040 7.274  38.3 7.3090     2   329    12.6  6.62  34.6
 0.01778  95.0  1.47     0 0.4030 7.135  13.9 7.6534     3   402    17.0  4.45  32.9
 0.02177  82.5  2.03     0 0.4150 7.610  15.7 6.2700     2   348    14.7  3.11  42.3
 0.03510  95.0  2.68     0 0.4161 7.853  33.2 5.1180     4   224    14.7  3.81  48.5
 0.02009  95.0  2.68     0 0.4161 8.034  31.9 5.1180     4   224    14.7  2.88  50.0
 0.31533   0.0  6.20     0 0.5040 8.266  78.3 2.8944     8   307    17.4  4.14  44.8
 0.52693   0.0  6.20     0 0.5040 8.725  83.0 2.8944     8   307    17.4  4.63  50.0
 0.38214   0.0  6.20     0 0.5040 8.040  86.5 3.2157     8   307    17.4  3.13  37.6
       ⋮
 0.33147     0  6.20     0 0.5070 8.247  70.4 3.6519     8   307    17.4  3.95  48.3
 0.51183     0  6.20     0 0.5070 7.358  71.6 4.1480     8   307    17.4  4.73  31.5
 0.36894    22  5.86     0 0.4310 8.259   8.4 8.9067     7   330    19.1  3.54  42.8
 0.01538    90  3.75     0 0.3940 7.454  34.2 6.3361     3   244    15.9  3.11  44.0
 0.61154    20  3.97     0 0.6470 8.704  86.9 1.8010     5   264    13.0  5.12  50.0
 0.66351    20  3.97     0 0.6470 7.333 100.0 1.8946     5   264    13.0  7.79  36.0
 0.54011    20  3.97     0 0.6470 7.203  81.8 2.1121     5   264    13.0  9.59  33.8
 0.53412    20  3.97     0 0.6470 7.520  89.4 2.1398     5   264    13.0  7.26  43.1
 0.52014    20  3.97     0 0.6470 8.398  91.5 2.2885     5   264    13.0  5.91  48.8
 0.82526    20  3.97     0 0.6470 7.327  94.5 2.0788     5   264    13.0 11.25  31.0
 0.55007    20  3.97     0 0.6470 7.206  91.6 1.9301     5   264    13.0  8.10  36.5
 0.78570    20  3.97     0 0.6470 7.014  84.6 2.1329     5   264    13.0 14.79  30.7
 0.57834    20  3.97     0 0.5750 8.297  67.0 2.4216     5   264    13.0  7.44  50.0
 0.54050    20  3.97     0 0.5750 7.470  52.6 2.8720     5   264    13.0  3.16  43.5
 0.22188    20  6.96     1 0.4640 7.691  51.8 4.3665     3   223    18.6  6.58  35.2
 0.10469    40  6.41     1 0.4470 7.267  49.0 4.7872     4   254    17.6  6.05  33.2
 0.03578    20  3.33     0 0.4429 7.820  64.5 4.6947     5   216    14.9  3.76  45.4
 0.06129    20  3.33     1 0.4429 7.645  49.7 5.2119     5   216    14.9  3.01  46.0
 0.01501    90  1.21     1 0.4010 7.923  24.8 5.8850     1   198    13.6  3.16  50.0
 0.00906    90  2.97     0 0.4000 7.088  20.8 7.3073     1   285    15.3  7.85  32.2
 0.07886    80  4.95     0 0.4110 7.148  27.7 5.1167     4   245    19.2  3.56  37.3
 0.05561    70  2.24     0 0.4000 7.041  10.0 7.8278     5   358    14.8  4.74  29.0
 0.05515    33  2.18     0 0.4720 7.236  41.1 4.0220     7   222    18.4  6.93  36.1
 0.07503    33  2.18     0 0.4720 7.420  71.9 3.0992     7   222    18.4  6.47  33.4
 0.01301    35  1.52     0 0.4420 7.241  49.3 7.0379     1   284    15.5  5.49  32.7
 3.47428     0 18.10     1 0.7180 8.780  82.9 1.9047    24   666    20.2  5.29  21.9
 6.53876     0 18.10     1 0.6310 7.016  97.5 1.2024    24   666    20.2  2.96  50.0
19.60910     0 18.10     0 0.6710 7.313  97.9 1.3163    24   666    20.2 13.44  15.0
 8.24809     0 18.10     0 0.7130 7.393  99.3 2.4527    24   666    20.2 16.74  17.8
 5.73116     0 18.10     0 0.5320 7.061  77.0 3.4106    24   666    20.2  7.01  25.0
A data.frame: 13 × 13
    crim    zn indus  chas    nox    rm   age    dis   rad   tax ptratio lstat  medv
   <dbl> <dbl> <dbl> <int>  <dbl> <dbl> <dbl>  <dbl> <int> <dbl>   <dbl> <dbl> <dbl>
 0.12083     0  2.89     0 0.4450 8.069  76.0 3.4952     2   276    18.0  4.21  38.7
 1.51902     0 19.58     1 0.6050 8.375  93.9 2.1620     5   403    14.7  3.32  50.0
 0.02009    95  2.68     0 0.4161 8.034  31.9 5.1180     4   224    14.7  2.88  50.0
 0.31533     0  6.20     0 0.5040 8.266  78.3 2.8944     8   307    17.4  4.14  44.8
 0.52693     0  6.20     0 0.5040 8.725  83.0 2.8944     8   307    17.4  4.63  50.0
 0.38214     0  6.20     0 0.5040 8.040  86.5 3.2157     8   307    17.4  3.13  37.6
 0.57529     0  6.20     0 0.5070 8.337  73.3 3.8384     8   307    17.4  2.47  41.7
 0.33147     0  6.20     0 0.5070 8.247  70.4 3.6519     8   307    17.4  3.95  48.3
 0.36894    22  5.86     0 0.4310 8.259   8.4 8.9067     7   330    19.1  3.54  42.8
 0.61154    20  3.97     0 0.6470 8.704  86.9 1.8010     5   264    13.0  5.12  50.0
 0.52014    20  3.97     0 0.6470 8.398  91.5 2.2885     5   264    13.0  5.91  48.8
 0.57834    20  3.97     0 0.5750 8.297  67.0 2.4216     5   264    13.0  7.44  50.0
 3.47428     0 18.10     1 0.7180 8.780  82.9 1.9047    24   666    20.2  5.29  21.9
      crim               zn            indus             chas       
 Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
 1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
 Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
 Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
 3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
 Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
      nox               rm             age             dis       
 Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
 1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
 Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
 Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
 3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
 Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
      rad              tax           ptratio          lstat           medv     
 Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :2.47   Min.   :21.9  
 1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:3.32   1st Qu.:41.7  
 Median : 7.000   Median :307.0   Median :17.40   Median :4.14   Median :48.3  
 Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :4.31   Mean   :44.2  
 3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:5.12   3rd Qu.:50.0  
 Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :7.44   Max.   :50.0  
      crim                zn             indus            chas        
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
 1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
 Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
 Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
 3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
 Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
      nox               rm             age              dis        
 Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
 1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
 Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
 Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
 3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
 Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
      rad              tax           ptratio          lstat      
 Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   : 1.73  
 1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.: 6.95  
 Median : 5.000   Median :330.0   Median :19.05   Median :11.36  
 Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :12.65  
 3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:16.95  
 Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :37.97  
      medv      
 Min.   : 5.00  
 1st Qu.:17.02  
 Median :21.20  
 Mean   :22.53  
 3rd Qu.:25.00  
 Max.   :50.00  
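
Reading two long `summary()` printouts side by side is error-prone, so a compact alternative is to put the subset means and the overall means in one table. The sketch below is one way to do it; the helper name `compare_means` and the `toy` values are assumptions for illustration and are not taken from `Boston`.

```r
# Side-by-side column means: rows meeting a condition vs. all rows.
# Illustrative helper; `keep` is a logical vector with one entry per row.
compare_means <- function(df, keep) {
  data.frame(
    variable = names(df),
    subset   = unname(colMeans(df[keep, , drop = FALSE])),
    overall  = unname(colMeans(df)),
    row.names = NULL
  )
}

# Hypothetical toy data reusing three Boston column names
toy <- data.frame(
  rm   = c(5.5, 6.0, 6.5, 8.2, 8.6),
  crim = c(4.0, 3.0, 5.0, 0.5, 0.7),
  medv = c(18, 20, 22, 45, 50)
)

res <- compare_means(toy, toy$rm > 8)
res$ratio <- res$subset / res$overall   # > 1 means the subset mean is higher
print(res)
```

On the real data, `compare_means(Boston, Boston$rm > 8)` should reproduce the `Mean` rows of the two summaries above in a single table.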
